PR to fetch Updates from the sim main (don't merge) #529
Open
SaitejaSankoji wants to merge 30 commits into
…rigger faults (#4860) * fix(webhook): don't fault trigger run on user/workflow execution errors Webhook-triggered executions re-threw every error, so trigger.dev marked the run failed and fired #eng-errors alerts. The vast majority of these are user-caused workflow failures (missing required fields, invalid field references, bad URLs, provider 4xx, expired models, low credit) that are already recorded in the execution logs. Distinguish fault vs error in executeWebhookJobInternal: when the failure was finalized by core (the workflow ran and its failure is logged), complete the run with { success: false } instead of throwing. Errors that were not finalized came from the webhook pipeline itself and still re-throw to fault the run. Await waitForPostExecution first so the finalized flag is reliable. The error is still recorded on the run's OTel span via recordException (no ERROR status, so the run isn't faulted) and remains in the execution logs, so these stay investigable in Tempo/Loki without false alerts. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(schedule): don't fault trigger run on error-recovery failures The schedule task already treats workflow-execution failures as recorded errors rather than trigger faults, but the outermost catch's own recovery code (the infra-retry and releaseClaim calls) was unguarded. A secondary DB blip while releasing the claim re-threw and escaped run(), faulting the trigger.dev run and firing an alert — a double-fault during cleanup. Wrap the recovery path in a try/catch: log and record the exception on the span without re-throwing. The claim expires on its TTL and the next tick re-claims the schedule, so swallowing the cleanup failure is safe. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(webhook): assert waitForPostExecution runs on the non-finalized path Guards the race fix on the infra-error path so a future refactor can't silently drop the await. 
Addresses Greptile review feedback. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
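The fault-versus-error split described in the webhook commit can be sketched as follows. This is a minimal illustration, not the actual `executeWebhookJobInternal` code: the `WebhookJobHooks` interface and the `wasFinalizedByCore` callback are hypothetical names introduced here, and recording the exception only on the finalized path is an assumption.

```typescript
// Hypothetical hooks: the real executeWebhookJobInternal wiring is not shown in the commit text.
interface WebhookJobHooks {
  execute: () => Promise<void>
  waitForPostExecution: () => Promise<void>
  wasFinalizedByCore: () => boolean
  recordException: (err: unknown) => void
}

interface WebhookJobResult {
  success: boolean
  error?: string
}

async function runWebhookJob(hooks: WebhookJobHooks): Promise<WebhookJobResult> {
  try {
    await hooks.execute()
    return { success: true }
  } catch (err) {
    // Wait for post-execution work first so the finalized flag is reliable.
    await hooks.waitForPostExecution()

    if (hooks.wasFinalizedByCore()) {
      // Error: the workflow ran and its failure is already in the execution logs.
      // Record it for tracing, but complete the run instead of faulting it.
      hooks.recordException(err)
      return { success: false, error: err instanceof Error ? err.message : String(err) }
    }

    // Fault: the webhook pipeline itself failed, so the run should fail.
    throw err
  }
}
```

Reading the finalized flag only after `waitForPostExecution` resolves is the part the added test guards: checking it earlier could misclassify a logged workflow failure as a pipeline fault.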
…le storage (#4865) * feat(storage): support S3-compatible endpoints (R2, MinIO, B2) for file storage Add S3_ENDPOINT and S3_FORCE_PATH_STYLE env vars, wired into the shared upload S3 client so Cloudflare R2, MinIO, Backblaze B2, and other S3-compatible stores work for self-hosted file storage. The endpoint is trusted operator config (no SSRF/HTTPS gate). Makes the multipart Location fallback endpoint-aware, extends the S3 client unit tests, and documents the new vars in Helm values, .env.example, and the English self-hosting docs (incl. browser-reachability + CORS guidance). * docs(storage): add RustFS as an S3-compatible provider example * fix(storage): address review feedback and fix env mock for CI - Add envBoolean to the shared env test mock (createEnvMock) so config.ts's forcePathStyle coercion resolves — fixes failing knowledge/utils.test.ts - Declare S3_FORCE_PATH_STYLE as z.string() (every other env var's pattern); it's coerced via envBoolean at the consumption site, avoiding a boolean type that never matches the string process.env value - Log path-style from S3_CONFIG.forcePathStyle (envBoolean) instead of a separate isTruthy call, so the startup log can't disagree with the client - Make buildObjectFallbackUrl honor forcePathStyle: virtual-hosted-style URL (bucket as subdomain) for R2, path-style only when forcePathStyle is set * docs(storage): add backlinks to S3-compatible providers (R2, MinIO, Ceph, B2, RustFS) and backends
…Marketplace guidelines (#4867)
* fix(auth): link SSO sign-in to existing same-email accounts SSO sign-ins failed with "account not linked" (then a cascading "Invalid callbackURL") when an account with the same email already existed. Better Auth's `@better-auth/sso` plugin hardcodes the provisioned user's `emailVerified: options?.trustEmailVerified ? <claim> : false`, so with the option unset every SSO login arrived unverified and tripped the account linking gate `(!isTrustedProvider && !userInfo.emailVerified)` whenever the provider was not in `accountLinking.trustedProviders`. - Set `trustEmailVerified: true` on the SSO plugin so the IdP's verified-email claim is honored (Okta, Entra ID, Google Workspace, Auth0 all assert it). - Trust the operator's configured provider for linking: merge `SSO_PROVIDER_ID` (when present in the app env) plus a new `SSO_TRUSTED_PROVIDER_IDS` list into `trustedProviders`. Empty/unset => no-op, so existing deployments are unchanged. - Invite callback URL: return a clean `/invite/<id>` (token already persists in sessionStorage) so an appended `?error=` cannot produce a malformed URL. - Document `SSO_TRUSTED_PROVIDER_IDS` in SSO docs, Helm values, and schema. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(auth): address review — guard trusted SSO providers, revert invite callback - Only compute additionalTrustedSsoProviders when SSO_ENABLED, so trustedProviders is exactly unchanged for non-SSO deployments. - Revert the invite getCallbackUrl change: keep the token in the callback URL (with sessionStorage/searchParams fallback) so the token survives when sessionStorage is unavailable. The account-linking fix removes the "account not linked" error that caused the malformed callback URL, so the callback cleanup is unnecessary. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(auth): guard trusted SSO providers with isSsoEnabled (isTruthy) env.SSO_ENABLED can be the string "false" (t3-env returns strings for booleans), which is truthy in JS. 
Use the canonical isSsoEnabled flag (isTruthy(env.SSO_ENABLED)) so SSO_ENABLED="false"/"0" correctly yields an empty trusted-provider list, matching how SSO is gated elsewhere. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
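The `"false"`-is-truthy pitfall from the last auth commit can be illustrated with a small sketch. The `isTruthy` body below is an assumed implementation (the real helper is not shown), and treating `SSO_TRUSTED_PROVIDER_IDS` as a comma-separated list is also an assumption.

```typescript
// Assumed behavior: only "true"/"1" (case-insensitive) or boolean true count as enabled.
function isTruthy(value: string | boolean | undefined): boolean {
  if (typeof value === "boolean") return value
  if (value === undefined) return false
  const normalized = value.trim().toLowerCase()
  return normalized === "true" || normalized === "1"
}

interface SsoEnv {
  SSO_ENABLED?: string
  SSO_PROVIDER_ID?: string
  SSO_TRUSTED_PROVIDER_IDS?: string // assumed to be comma-separated
}

function additionalTrustedSsoProviders(env: SsoEnv): string[] {
  // A bare `if (env.SSO_ENABLED)` would pass for the string "false".
  if (!isTruthy(env.SSO_ENABLED)) return []

  const listed = (env.SSO_TRUSTED_PROVIDER_IDS ?? "")
    .split(",")
    .map((id) => id.trim())
    .filter((id) => id.length > 0)
  const primary = env.SSO_PROVIDER_ID ? [env.SSO_PROVIDER_ID] : []

  // De-duplicate while keeping the configured order.
  return Array.from(new Set(primary.concat(listed)))
}
```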
* feat(gitlab): sync repository files (code/docs) alongside wiki and issues
* fix(gitlab): follow full keyset next-link for repo tree + skip disabled wiki gracefully in all/both
* fix(gitlab): error on bad user branch (tree 404), warn on resolveRef fallback, normalize pathPrefix to directory boundary
* fix(gitlab): preserve slashes in branch ref for file source URLs (GitFlow branches)
* fix(gitlab): never abort sync on repo-tree 404 (empty repo); validate user branch exists at setup instead
* fix(gitlab): validate ref via commits endpoint so tags and commit SHAs are accepted, not just branches
* fix(gitlab): skip repo phase on tree 403 (missing read_repository) so wiki/issues still sync under all
* fix(byok): add Fal icon and repair corrupted Ollama icon path

The Ollama BYOK icon rendered blank because its SVG path had spaces stripped between arc-command flags (e.g. `a5.05 5.05 0 12.05-.636`), producing invalid tokens. Replaced with the canonical Ollama path. Also added a dedicated FalIcon (was falling back to the generic ImageIcon) and wired it into the BYOK provider list.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(icons): repair corrupted Fireworks icon arc command

The leftmost spark of the Fireworks icon never rendered because its third subpath used a corrupted arc command (`a34.59 34.59 0 17.15 37.65`) with collapsed flags, yielding an invalid sweep-flag of 7 that aborts the path parse. Replaced with the canonical lobehub Fireworks source.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
…less execution (#4870)

* fix(mothership): run client-routed workflow tools server-side in headless execution

Headless Mothership (Mothership block, no browser) could not run workflows. The run_workflow/run_workflow_until_block/run_block/run_from_block tools are registered with route 'client', so the executor gate (isSimExecuted) skipped their registered server handlers and fell through to executeAppTool, throwing 'Tool not found'. Interactive runs delegate these to the browser before reaching the executor, so only the headless path broke. Allow a client-routed tool to use its registered server handler when one exists, which only affects the four run tools (the only client-routed tools, all of which have server handlers).

* test(mothership): clear handler registry between executor tests

Add clearHandlers() helper and reset the module-level handler registry in beforeEach so handlers registered in one test do not leak into the next.
…aks (#4869)

* fix(dev): use globalThis for singleton state to prevent HMR memory leaks
* fix(dev): apply globalThis guard to rate-limiter storage factory to prevent listener accumulation
* fix(types): resolve McpConnectionManager globalThis undefined type error
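The `globalThis` singleton guard these commits refer to usually looks something like the sketch below. The `__exampleRegistry` key and the `getRegistry` function are hypothetical names for illustration; the actual singletons in the PR (for example `McpConnectionManager`) are not reproduced here.

```typescript
// Module-level `const registry = new Map()` would be re-created on every hot reload,
// leaking the previous instance. Storing it on globalThis lets reloads reuse it.
type GlobalWithRegistry = typeof globalThis & {
  __exampleRegistry?: Map<string, number> // hypothetical key name
}

function getRegistry(): Map<string, number> {
  const globalRef = globalThis as GlobalWithRegistry
  if (!globalRef.__exampleRegistry) {
    globalRef.__exampleRegistry = new Map<string, number>()
  }
  return globalRef.__exampleRegistry
}
```

Declaring the property as optional on the widened global type is one way to avoid the "possibly undefined" type error mentioned in the third commit.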
…sSameOrigin (#4873)

* fix(gitlab): pin pagination cursor to configured host before following it

The repository-tree keyset cursor stores GitLab's verbatim rel="next" URL and re-fetches it with an Authorization: Bearer header. Assert the cursor's origin matches the configured apiBase before following it, so a tampered or corrupted fileNextUrl cannot exfiltrate the access token to an attacker-controlled host. Fails closed on mismatch.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* improvement(validation): generalize isSameOrigin and reuse across connectors/tools

Add an optional base argument to the shared isSameOrigin (defaulting to the app base URL) so callers can pin a URL to any trusted origin. The GitLab connector's cursor host-check and the tools self-origin check now consume the shared helper instead of their own URL-parsing.

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
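A minimal sketch of an origin check with an optional base, in the spirit of the generalized `isSameOrigin`. The signature, the `APP_BASE_URL` constant, and the choice to reject relative URLs are assumptions; the shared helper's real behavior may differ.

```typescript
// Hypothetical default; the real helper defaults to the app base URL from config.
const APP_BASE_URL = "https://app.example.com"

function isSameOrigin(url: string, base: string = APP_BASE_URL): boolean {
  try {
    // Absolute-only parse: relative or malformed input throws and fails closed.
    return new URL(url).origin === new URL(base).origin
  } catch {
    return false
  }
}

// Guard a stored pagination cursor before re-fetching it with a bearer token.
function assertCursorOnHost(cursorUrl: string, apiBase: string): string {
  if (!isSameOrigin(cursorUrl, apiBase)) {
    throw new Error("Pagination cursor does not match the configured host")
  }
  return cursorUrl
}
```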
)

* fix(storage): percent-encode object key in multipart fallback URL

buildObjectFallbackUrl built the object URL from a raw key. Keys with spaces or reserved characters (and the pre-existing AWS branch) would produce a structurally invalid location. Encode the key per path segment (preserving '/' separators) across all branches (AWS, custom path-style, virtual-hosted).

* refactor(storage): clearer per-segment key encoding in fallback URL
* test(storage): cover multipart fallback URL (AWS, R2 virtual-hosted, MinIO path-style, key encoding)
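The three URL branches and the per-segment encoding can be sketched like this. It is an illustration under assumptions, not the repository's `buildObjectFallbackUrl`: the `S3LikeConfig` shape and the exact AWS host pattern are chosen here for the example.

```typescript
// Hypothetical config shape for the example.
interface S3LikeConfig {
  bucket: string
  region: string
  endpoint?: string // custom S3-compatible endpoint (R2, MinIO, ...)
  forcePathStyle?: boolean
}

// Encode each path segment separately so '/' separators in the key survive.
function encodeObjectKey(key: string): string {
  return key
    .split("/")
    .map((segment) => encodeURIComponent(segment))
    .join("/")
}

function buildObjectFallbackUrl(config: S3LikeConfig, key: string): string {
  const encodedKey = encodeObjectKey(key)

  if (!config.endpoint) {
    // AWS branch: virtual-hosted style on the regional endpoint.
    return `https://${config.bucket}.s3.${config.region}.amazonaws.com/${encodedKey}`
  }

  const endpoint = new URL(config.endpoint)
  if (config.forcePathStyle) {
    // Path-style: bucket is the first path segment (typical for MinIO).
    return `${endpoint.origin}/${config.bucket}/${encodedKey}`
  }
  // Virtual-hosted style: bucket becomes a subdomain (typical for R2).
  return `${endpoint.protocol}//${config.bucket}.${endpoint.host}/${encodedKey}`
}
```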
…agnostics) (#4868) * fix(tables): retry transient DB/Redis failures in cell execution and surface error causes Workflow-group-cell runs intermittently failed on trivial DB reads/writes under heavy fan-out, stranding cells in `running`. Investigation showed the PlanetScale and ElastiCache backends were healthy at the time — the failures are transient connection-level faults that the cell (maxAttempts: 1) had no tolerance for, and the real cause was never logged (Drizzle wraps it as "Failed query: ..." and the driver cause lives in error.cause). Resilience: - Add retryTransient (lib/table/retry-transient.ts): retries only transient infra errors (reuses isRetryableInfrastructureError; adds an ioredis command-timeout match) with jittered backoff, then rethrows. Fail-fast for everything else. - Wrap the cell's getTableById/getRowById reads, the terminal write (cell-write updateRow — idempotent via the executionId guard), and the Redis cascade-lock acquire. Diagnostics: - Add describeError (lib/core/errors/retryable-infrastructure.ts): walks the .cause chain and always returns the underlying driver cause (code/errno/ syscall + causeChain), including for unclassified errors like AbortError. - Log `cause` + a `retryable` flag (and aborted/timedOut in the cell's main catch) across the cell + finalization error paths, mirroring the existing schedule-execution pattern. Logging-only; no behavior change. This lets the next recurrence reveal the real cause and whether the retry applies. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(tables): address review feedback on cell retry resilience - retryTransient: re-check the abort signal after the backoff sleep so a cancellation during sleep stops the next attempt (don't run/return work for an already-cancelled task). 
- isRetryableRedisError: walk the .cause chain (mirroring the infra classifier) so wrapped Redis timeouts are recognized; drop "Connection is in subscriber mode" — that's a connection-state programming error, not a transient drop, and would just fail identically every retry. - cascade-lock: stop wrapping acquireLock in retryTransient. acquireLock is a non-idempotent SET NX, so retrying after a timed-out-but-applied first SET returns false (key already ours) and yields a false `contended` that skips the cascade. A transient Redis blip here just fails the run before pickup (no stranded cell); the dispatcher re-drives it. - Tests: cause-chain Redis match, subscriber-mode exclusion, abort-during-sleep. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(tables): drop out-of-scope abort/timeout fields from cell catch The main catch logged `aborted`/`timedOut` from `abortSignal`/`timeoutController`, but those are declared inside the outer try block (the inner try around executeWorkflow is try/finally, so this catch belongs to the outer try) and are not in scope in the catch — `next build`'s type-check failed with "Cannot find name 'abortSignal'". Local incremental `tsc --noEmit` had skipped the file and falsely passed; the Cursor/Greptile reviewers flagged this correctly. Removed the two fields. Abort/timeout is still surfaced via `cause: describeError(err)` (an aborted run shows `name: 'AbortError'` / the timeout message), so no diagnostic signal is lost. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * refactor(tables): drop in-process retry, keep cause diagnostics only In-process retry is the wrong layer for this path: the cell task is maxAttempts:1 by design, retrying on a possibly-degraded worker may not help, and it masks the very transient-failure signal we're trying to capture before we understand the root cause. 
Removed retryTransient entirely (file + all wrapping in cell-write, the cascade reads, and the lock acquire) and kept only the diagnostic logging. - Deleted lib/table/retry-transient.ts (+ test); cell-write and the cascade reads call getTableById/getRowById/updateRow directly again, fail-fast. - Kept describeError + `cause`/`retryable` fields across the cell + finalization catch blocks; the cell-path `retryable` flag now sources from isRetryableInfrastructureError (the canonical classifier) for consistency. Diagnostics-first: surface the real driver cause on the next recurrence, then decide the actual fix (e.g. task-level maxAttempts, or addressing the worker- side cause) from evidence rather than a speculative in-process retry. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(schedules): log error cause on scheduled-execution failure paths The scheduled-job failure paths logged the raw error (.message/stack only) — its `.cause` (the real driver error behind a Drizzle "Failed query: ..." wrapper) was never recorded, and the classified-only `describeRetryableInfrastructureError` returns undefined for unrecognized errors. A real failed run (same incident window as the cell failures) failed in `applyScheduleUpdate` with exactly this unrecorded cause. Added `cause: describeError(error)` (always-on, walks the cause chain) to the applyScheduleUpdate catch, the early-failure catch, and the unhandled-error catch — passed as a second arg so the existing message+stack still emit. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * refactor(errors): move describeError to @sim/utils/errors `describeError` is a general-purpose error/cause-chain helper — it didn't belong in `lib/core/errors/retryable-infrastructure.ts` (that module is specifically about classifying retryable infra errors, and the name read wrong for a generic diagnostic). 
Moved it to `@sim/utils/errors` alongside `toError`/ `getErrorMessage`/`getPostgresErrorCode`, with its own cycle-safe cause walk. - Added describeError + DescribedError + tests to packages/utils/src/errors.ts. - Reverted the describeError addition from retryable-infrastructure.ts (it keeps only isRetryableInfrastructureError / describeRetryableInfrastructureError, which are accurately named and still used by the schedule retry path). - Re-pointed all consumers (cell, logging-session, pause-persistence, schedule) to import describeError from @sim/utils/errors. The `retryable` classification flag still sources from isRetryableInfrastructureError where used. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
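A cycle-safe cause walk of the kind `describeError` performs might look like the sketch below. The `DescribedError` fields here are a simplified guess based on the commit text (`code` and `causeChain`); the real type in `@sim/utils/errors` may carry more (for example `errno` and `syscall`).

```typescript
// Simplified, assumed shape; not the actual DescribedError from the PR.
interface DescribedError {
  message: string
  code?: string
  causeChain: string[]
}

function describeError(err: unknown): DescribedError {
  const seen = new Set<unknown>()
  const causeChain: string[] = []
  let code: string | undefined
  let current: unknown = err

  // Follow .cause links; the `seen` set stops the walk if the chain loops.
  while (current !== undefined && current !== null && !seen.has(current)) {
    seen.add(current)
    if (!(current instanceof Error)) {
      causeChain.push(String(current))
      break
    }
    const withExtras = current as Error & { code?: unknown; cause?: unknown }
    causeChain.push(`${withExtras.name}: ${withExtras.message}`)
    // The deepest error that has a string code wins (the driver-level cause).
    if (typeof withExtras.code === "string") code = withExtras.code
    current = withExtras.cause
  }

  const message = err instanceof Error ? err.message : String(err)
  return { message, code, causeChain }
}
```

Unlike a classifier that returns `undefined` for unrecognized errors, this always returns something loggable, which is the "always-on" property the schedule commit relies on.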
…dgebase connector, SSO provider ID allowlist, singleton memory leak fix
) * feat(tables): background import for large CSVs with live progress * fix(tables): address review — import heartbeat, overlap guard, column/empty validation * fix(tables): guard sync import overlap, scope fileKey to workspace, delete-on-replace after download * fix(tables): stream large CSV imports from storage instead of buffering the whole file * test(tables): fix async-import route tests for workspace-scoped fileKey + name uniquification * fix(tables): append imports start after existing rows; reconcile missed import failures in the tray * fix(tables): delete the uploaded CSV from storage after the import finishes * fix(tables): validate replace before deleting rows; ignore stale replayed import events by importId * fix(tables): bind import worker to its importId (no stale-worker clobber/overlap) and destroy storage stream on failure * feat(tables): byte-based import progress, cancel support, and a start toast that opens the import view * fix(tables): don't emit ready after cancel; honor cancel during the upload phase * improvement(tables): use a stop (square) icon for canceling an active import * fix(tables): make markTableImporting an atomic claim to close the concurrent-import TOCTOU race * improvement(tables): preview CSV import from a slice, drop client row-count warning The import dialog parsed the entire file in the browser to show an exact row count and a row-limit warning. That holds the whole file in memory, blocks the main thread, and hits V8's ~512MB string ceiling — so the dialog capped the effective import size well below what the streaming importer handles. Parse only the first 512KB (headers + sample for the mapping); drop the exact count and the "would exceed the row limit by N" gate. The DB row-count trigger already enforces max_rows server-side, so an over-limit import fails fast during the run with a clear message instead of being blocked by an expensive parse. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(tables): gate import ownership every batch and stop canceled imports reappearing - Worker checked run ownership only at the progress cadence (~every 5k rows), so a canceled/superseded import could insert several more batches (incl. the final partial batch) before stopping. Move the updateImportProgress ownership gate to the top of every flush — a run that lost the table stops within one batch. - A list/dialog import canceled mid-upload left the server row `importing` until the in-flight server cancel landed; hydration re-seeded it from useTablesList, so the dismissed import flickered back. Flag the real table id canceled on the mid-upload cancel path, skip re-seeding flagged tables in hydration, and clear the flag once the server import is terminal. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * refactor(tables): drive import tray by polling derived from server, not SSE Import progress no longer holds an SSE connection per importing table. The tray now derives its importing rows live from the table list (React Query), polled only while an import is in flight; the table detail page keeps its own cell-state SSE for grid refresh. - store holds only client-only state now: optimistic uploads, which terminal completions to surface this session, canceled ids, menu open — no copied importStatus/rowsProcessed. - useWorkspaceImports is the single source: polls via a data-predicate refetchInterval, derives rows, and fires completion toasts on the importing -> terminal transition. - kickoff handlers use startUpload/setUploadPercent/endUpload; the invalidated list refetch surfaces the server row and polling takes over. - removes use-hydrate-import-tray + use-import-progress-tracker (folded in). - trims over-verbose comments across the import paths. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(tables): ignore superseded-run import events in the detail SSE cache applyImport applied every replayed import payload to the detail cache. The SSE buffer can replay a prior import's terminal event for the same table, stomping a newer in-flight import's UI. Lock to the active run's importId (and ignore a replayed terminal before the id is known), matching the guard the header tracker used to have. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(tables): close sync-import TOCTOU by claiming the atomic import gate The sync import route checked importStatus from a checkAccess snapshot, then parsed/validated/wrote seconds later without taking the atomic claim. A concurrent async kickoff (markTableImporting) could slip into that window and both writers would run together — for replace mode, two delete+insert passes leave the table indeterminate. Claim the same atomic gate (markTableImporting) right before the write and release it in the finally (before the response returns, so a client refetch never sees the transient status). A row-level FOR UPDATE was avoided on purpose: it would invert lock order against the position advisory lock / row-count trigger and risk a deadlock — markTableImporting is the established gate. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(multipart): keep abort wired after resolve so a mid-upload disconnect tears down the stream readMultipart resolves on the file-part header and hands the caller an un-drained stream, but settle() ran cleanup() and detached the abort listener on that path too. A client disconnect mid-upload then destroyed nothing — busboy never saw EOF, the file stream stalled, and the route's `for await` held a request slot until maxDuration (300s). Re-arm an abort handler scoped to the file stream on resolve, detached when the stream closes. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ntState verification (#4877) * fix(chat): prevent XSS in attachment preview via filename/data URL escaping Replace document.write with an escaped blob URL preview: HTML-entity encode the user-controlled filename and data URL, open with noopener,noreferrer, and revoke the blob URL after navigation. * fix(mcp): guard OAuth discovery and token revocation against SSRF Route discoverOAuthServerInfo and the RFC 7009 revocation POST through an SSRF-guarded fetch that validates every request URL via validateMcpServerSsrf (blocking private/reserved/loopback targets, honoring ALLOWED_MCP_DOMAINS and self-hosted localhost rules) and pins the connection to the resolved IP to prevent DNS-rebinding TOCTOU. Previously these fetches used unvalidated global fetch against URLs taken verbatim from attacker-controllable authorization-server metadata. * fix(webhooks): verify Graph clientState on Teams chat-subscription notifications The microsoftteams_chat_subscription trigger set clientState=webhook.id when creating the Graph subscription but never validated it on inbound change notifications, so any request to the webhook path with a crafted notification body was treated as authentic (CWE-345). verifyAuth now requires every notification in the value array to carry a clientState matching the stored webhook id (constant-time compare) and rejects payloads without notifications. Validation handshakes (validationToken) are handled before auth and remain unaffected; outgoing-webhook HMAC auth is unchanged. * fix(webhooks): fail closed when Teams chat-subscription webhook id is unavailable Hardens the clientState check so a missing webhook id (theoretically unreachable, since the row is looked up by primary key) can never collapse the expected value to an empty string that a forged clientState could match. * docs(mcp): note AbortSignal does not bound SSRF-guard DNS lookup * improvement(chat): hoist HTML escape map to module-level constant
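The module-level escape map mentioned in the last chat commit can be sketched as below. The constant and function names are hypothetical; only the technique (HTML-entity encoding of user-controlled strings before they reach markup) comes from the commit text.

```typescript
// Hoisted once at module level rather than rebuilt on every call.
const HTML_ESCAPE_MAP: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
}

function escapeHtml(input: string): string {
  // Every character matched by the regex has an entry in the map;
  // the fallback only satisfies strict index-access typing.
  return input.replace(/[&<>"']/g, (char) => HTML_ESCAPE_MAP[char] ?? char)
}

// Build preview markup from a user-controlled filename and data URL.
function buildPreviewHtml(filename: string, dataUrl: string): string {
  return `<title>${escapeHtml(filename)}</title><img src="${escapeHtml(dataUrl)}" alt="${escapeHtml(filename)}">`
}
```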
* fix(mcp): enforce tool name validation in deploy modal
* fix(mcp): correct cn import path to fix build
* fix(mcp): align tool-name regex with server sanitization, add disabled-combobox hint
… .agents/skills and expand add-model touchpoints (#4882)

* chore(skills): mirror model/enrichment/hosted-key/council skills into .agents/skills and expand add-model touchpoints
* chore(skills): document council yaml omission and disambiguate validate-model cross-ref
…ool routes (#4884)

* fix(polling-tools): pass plan execution timeout to internal polling tool routes
* address comments
Reads and writes are fully cut over to the normalized copilot_messages table (verified in production: no writes to the column in 24h, recently-active chats have empty JSONB while copilot_messages holds the transcript). Drop the dead column via drizzle migration 0225 and re-type CopilotChatDetailRow.messages as an assembled (non-column) field. Deploy notes: reconcile any chats where the JSONB still leads copilot_messages before applying, and pg_repack copilot_chats afterward to reclaim the ~5.7GB TOAST storage (DROP COLUMN is metadata-only).
…form, Azure DevOps, YouTube, JSM, S3, Sentry) (#4880) * feat(connectors): add 7 knowledge base connectors (Google Forms, Typeform, Azure DevOps, YouTube, JSM, S3, Sentry) * fix(connectors): tighten listingCapped semantics per review (WIQL cap, batch omissions, cap-vs-exhaustion) * fix(connectors): google-forms listingCapped must fire on slice regardless of hitLimit (404-null-filter gap) * fix(connectors): s3 streaming size cap for chunked responses without content-length * fix(connectors): ado byte-exact file content fetch, google-forms hash-poisoning on listing failure * fix(connectors): ado auth-failure deletion guard, jsm last-page slice flag, google-forms response cap in hash * fix(connectors): shared streaming size-cap reader for ado file hydration (promote from s3) * fix(knowledge): flag incomplete listings at engine level when pagination is truncated * fix(connectors): ado flags listing incomplete when a non-empty repo has no resolvable branch * fix(knowledge): engine truncation flag is an absolute deletion block (fullSync cannot override); s3 byte-exact size fallback; ado tsdoc accuracy * improvement(knowledge): extract shouldReconcileDeletions gate as tested pure function, tighten engine comments * test(connectors): mapTags coverage for the 7 new connectors * fix(connectors): ado probes past the wiql 20k cap before flagging; document custom-wiql full-listing behavior * fix(connectors): ado flags partial repo trees when items listing emits a continuation token * fix(connectors): ado discards foreign-phase cursors; google-forms scans all response pages for change detection * fix(connectors): audit fixes across new connectors - registry: register x connector (was dead code, never wired in) - google-docs/google-drive/google-forms: gate deletion reconciliation on Drive incompleteSearch; google-docs also now sets listingCapped on its maxDocs cap path - jsm: add read:jira-user scope so reporter resolves on requests - gong: only set listingCapped on genuine 
truncation, not exact-cap source exhaustion - gitlab: issues phase switched to keyset pagination (removes ~50k offset ceiling), matching the repo-tree phase - grain: parallelize recording + transcript fetch in getDocument - ashby: document updatedAt-based content-hash limitation for notes/feedback change detection - tests: mapTags coverage for x, granola, greenhouse, fathom, rootly
…d tools (#4883) * feat(integrations): add ClickHouse block and expand Dagster + Tinybird tools * fix(tinybird): fail loudly on invalid query_pipe parameters JSON parsePipeParameters previously returned {} on any JSON parse error, so a mistyped 'parameters' input produced a successful pipe call with the dynamic filters silently dropped. Throw a clear error for non-empty, non-object input instead; an omitted/empty value still means 'no parameters'. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(dagster): guard NaN numeric coercions and bound list_assets pagination Address PR review: - Route all block numeric coercions (list_runs limit/createdAfter/createdBefore, get_run_logs logsLimit, list_assets assetsLimit) through a toFiniteNumber() guard so invalid/wand-generated text becomes undefined instead of NaN. - list_assets now applies a default page size (100) when no limit is given, so paging stays bounded and hasMore is meaningful even when limit is omitted. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(dagster): make list_assets hasMore exact via fetch N+1 Address PR review (hasMore true on exact page): request one extra row (pageSize + 1), use its presence as the authoritative hasMore, slice it off, and derive the returned cursor from the last RETURNED asset's key path (JSON-serialized; Dagster normalizes JS/Python whitespace on the way in). This removes the false-positive hasMore when the final page is exactly full. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(clickhouse): enforce read-only query operation and harden WHERE-clause guard * fix(dagster): make list_runs hasMore exact via fetch N+1 Address PR review (list runs false hasMore): request one extra row (pageSize + 1), use its presence as the authoritative hasMore, and slice it off before mapping. Removes the false-positive hasMore (and misleading cursor) when the final page is exactly `limit` runs long. Mirrors the list_assets fix. 
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(clickhouse): restrict DROP PARTITION to literal values to prevent SQL injection * fix(clickhouse): reject chained statements in read-only query operation * fix(clickhouse): force JSON output on query path and ignore comments when detecting chained statements * fix(tinybird): encode datasource/pipe names in URL paths to prevent traversal A user-or-llm datasource/pipe name interpolated raw into the URL path (e.g. 'real_ds/../../other') is normalized by the WHATWG URL parser and can target a different endpoint. Wrap the path segment with encodeURIComponent in the truncate, delete, and query_pipe URLs. Events/append pass the name via URLSearchParams, which already encodes, so they were unaffected. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(clickhouse): block WITH-led writes/DDL in read-only query operation * fix(clickhouse): validate column types structurally and normalize FORMAT around SETTINGS * fix(clickhouse): balance-check ORDER BY/PARTITION BY and skip leading comments in read-only guard --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
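The "fetch N+1" technique used for the exact `hasMore` in `list_assets` and `list_runs` can be sketched generically. The `fetchPage` function and its callback parameters are hypothetical; the real tools page against Dagster's GraphQL API, which is not modeled here.

```typescript
interface Page<T> {
  items: T[]
  hasMore: boolean
  cursor: string | undefined
}

// fetchRows and cursorOf are caller-supplied stand-ins for the real data source.
function fetchPage<T>(
  fetchRows: (limit: number) => T[],
  pageSize: number,
  cursorOf: (item: T) => string,
): Page<T> {
  // Ask for one extra row; its presence is the authoritative hasMore signal.
  const rows = fetchRows(pageSize + 1)
  const hasMore = rows.length > pageSize

  // Slice the probe row off so callers never see more than pageSize items.
  const items = hasMore ? rows.slice(0, pageSize) : rows

  // The cursor comes from the last RETURNED item, not the probe row.
  const last = items[items.length - 1]
  return { items, hasMore, cursor: last !== undefined ? cursorOf(last) : undefined }
}
```

Comparing `rows.length === pageSize` instead would report `hasMore: true` whenever the final page happens to be exactly full, which is the false positive the commits remove.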
* fix(autolayout): relocate notes that overlap blocks after layout
* fix(autolayout): harden note overlap resolution against resize and non-finite positions
* feat(metrics): emit hosted-key metrics to Grafana via OTel

Replace the dropped platform.hosted_key.* spans with OTel counters/histograms for usage, cost, failures, throttles, and queue waits. Wire a MeterProvider into the Next.js OTel SDK (trigger.dev already exports metrics). Per-key attribution via a key label (env var name).

* fix(metrics): correct hosted-key failure attribution

- Re-point used/cost/failed labels at the freshly acquired key after reacquire
- Classify quota-style 401/403 as rate_limited (mirror isRateLimitError)
- Count returned success:false runs (e.g. deep_research polling) as failed

* fix(metrics): label hosted_key.throttled with real provider on exhausted retries
* fix(metrics): parse OTLP metrics URL via URL/pathname, not string suffix

Handles query strings and trailing slashes so the /v1/traces->/v1/metrics swap can't produce a malformed endpoint, matching normalizeOtlpTracesUrl.
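Deriving the metrics endpoint through `URL` and `pathname` rather than a string suffix might look like the sketch below. The function name is hypothetical, and appending `/v1/metrics` when the path does not end in `/v1/traces` is an assumption about the fallback behavior.

```typescript
const TRACES_SUFFIX = "/v1/traces"
const METRICS_SUFFIX = "/v1/metrics"

// Hypothetical name; the PR only names normalizeOtlpTracesUrl as the sibling helper.
function toOtlpMetricsUrl(tracesUrl: string): string {
  const url = new URL(tracesUrl)

  // Work on the pathname only, so query strings are left untouched.
  const path = url.pathname.replace(/\/+$/, "")
  const basePath = path.endsWith(TRACES_SUFFIX)
    ? path.slice(0, path.length - TRACES_SUFFIX.length)
    : path // assumption: treat anything else as a collector base path

  url.pathname = basePath + METRICS_SUFFIX
  return url.toString()
}
```

A plain `endsWith("/v1/traces")` check on the full string would miss a URL such as `https://host/v1/traces/?token=abc`, since the trailing slash and query string both follow the suffix.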
- Route both row-GET endpoints (internal + v1) and the copilot tool through the single service.queryRows instead of three inline query copies; add a withExecutions option so the public v1 route still omits executions.
- Run COUNT(*) and the page fetch concurrently in queryRows.
- Move CSV-import transaction ownership out of the API route into importAppendRows / importReplaceRows so routes never hold a trx.
- Extract row position mechanics (reserve / shift / compact) into named private helpers in service.ts; no separate table-wrapper module.
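
Running the count and the page fetch concurrently, as the second bullet describes, could be sketched as follows. The `RowSource` interface and `queryRowsSketch` are hypothetical stand-ins for illustration; the real service.queryRows signature is not shown in this PR text.

```typescript
type Row = { id: string }

// Hypothetical data-access interface; a real implementation would issue SQL.
interface RowSource {
  count(): Promise<number>
  page(limit: number, offset: number): Promise<Row[]>
}

// Start both queries before awaiting either, so total latency is roughly
// the slower of the two rather than their sum.
async function queryRowsSketch(
  source: RowSource,
  limit: number,
  offset: number
): Promise<{ rows: Row[]; totalCount: number }> {
  const [totalCount, rows] = await Promise.all([source.count(), source.page(limit, offset)])
  return { rows, totalCount }
}
```

Promise.all rejects as soon as either query fails, so callers see one error path for both reads.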
…d/no-output badges (#4889)

* feat(tables): workflow version selection (live/deployed) and not-found/no-output badges

* fix(tables): draw row-selection left edge as checkbox cell border so it cannot be cut off

* fix(tables): per-group version in cascade, accurate deploy error, skip not-found for deployed groups

* fix(tables): render selection left edge as continuous strip overlapping row gridlines

* feat(tables): not-found column icon, optional workflow inputs, mothership deploymentMode

---------
Co-authored-by: waleed <walif6@gmail.com>
All app replicas shared a hardcoded service.instance.id ("mothership-sim"),
so OTel metrics from every process collapsed into one Prometheus series.
Their independent cumulative counters then interleaved, producing phantom
counter resets that corrupt rate()/increase() — staging hosted-key cost
inflated to ~$0.72 from a few cents, while no-`key` metrics (cost_charged,
throttled, queue_wait_*) were affected fleet-wide.
Append the hostname (the container id under ECS, unique per task) so each
replica gets its own series and sum(rate(...)) / sum(increase(...)) aggregate
correctly. The mothership-sim prefix is kept so Jaeger's clock-skew adjuster
still separates Sim from Go.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
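
To see why interleaved cumulative counters inflate increase(), here is a simplified model of counter-reset handling: any drop in value is treated as a reset to zero, so the full post-drop value is counted. It ignores the extrapolation Prometheus applies at window edges, and the sample values are invented for illustration.

```typescript
// Simplified increase(): sum of deltas, where a decrease is treated as a
// counter reset and contributes the whole new value.
function increaseWithResets(samples: readonly number[]): number {
  let total = 0
  let prev: number | undefined
  for (const cur of samples) {
    if (prev !== undefined) {
      total += cur >= prev ? cur - prev : cur
    }
    prev = cur
  }
  return total
}

// Two replicas, each with its own monotonically increasing counter.
const replicaA = [10, 12, 14]
const replicaB = [3, 4, 5]

// Same samples collapsed into one series because both share an instance id.
const collapsed = [10, 3, 12, 4, 14, 5]

const perReplica = increaseWithResets(replicaA) + increaseWithResets(replicaB)
// -> 6 (4 from A, 2 from B)

const merged = increaseWithResets(collapsed)
// -> 31 (each alternation looks like a reset)
```

Giving each replica its own service.instance.id keeps the two series separate, so the per-replica sum is what the query computes.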
…lag-gated, default off) (#4890)

* feat(tables): add order_key column, fractional-indexing util, and ordering flag (off)

* feat(tables): write order_key on insert, flag-gate delete reindex + query ordering, add backfill
Flag off (default) = identical behavior. Single-insert assigns a fractional order_key; queryRows orders by order_key when the flag is on; deletes skip the O(N) reindex when on. Per-table-atomic backfill script populates existing rows.

* feat(tables): write order_key on all insert paths (batch, upsert, replace, import, create, copilot)
Completes the always-write-keys prerequisite: every row insert now assigns a fractional order_key consistent with position order, so the flag can be flipped safely after backfill. Flag off (default) still = identical behavior.

* feat(tables): insert-by-neighbor-id + orderKey on wire + client order-by-key
Inserts express intent as afterRowId/beforeRowId (O(1) key mint via the (table_id,order_key,id) index); orderKey is returned on every row; client reconcile/undo place by orderKey (no neighbor bump) with position fallback. Flag off = unchanged. 205 table tests pass.

* feat(tables): resolve position-based inserts by key ordinal under the flag
Position-based callers (mothership tool, v1 API, undo fallback, transient old clients) resolve their insert neighbor by order_key ordinal (OFFSET) when the flag is on; positions are gappy then, so WHERE position=N would miss. Flag off keeps the indexed position lookup. The mothership tool itself is unchanged.

* test(tables): flag-on coverage: delete skips reindex, insert mints key + no shift

* fix lint

* chore(db): regenerate order_key migration with default drizzle name

* fix(tables): address review: guard neighbor insert + mutual-exclusion + safe reconcile
- resolveInsertByNeighbor throws when the anchor row is missing (was silently inserting at the front) and when its order_key is null under the flag.
- insert contract: afterRowId/beforeRowId are mutually exclusive (refine).
- reconcileCreatedRow only key-sorts when every cached row is keyed, so mid-backfill un-keyed rows aren't yanked to the front.

* fix(kb): restore non-null guard in storage-key filter (unsafe-lint regression)

* refactor(tables): extract maxOrderKey + thread import append key
- Extract maxOrderKey(executor, tableId) helper; replaces three identical max(order_key) selects (single/batch insert append + import).
- Import: read the append anchor once up front and thread each batch's last key forward (nextImportStartOrderKey + afterOrderKey) instead of re-scanning max(order_key) per batch over a growing table: one scan per import, not one per 1k-row batch.

* fix(tables): keep insert body base omittable for v1 contract
The afterRowId/beforeRowId mutual-exclusion .refine() turned the schema into a ZodEffects, which Zod forbids .omit() on, so v1's insertTableRowBodySchema.omit({ position }) threw at module load (runtime-only; tsc misses it). Split the plain object base out, apply the shared refine on top, and have v1 omit from the base then re-apply it.

* fix(tables): chunk backfill order-key writes
A single UPDATE … FROM (VALUES …) over a whole large table overflows the JS call stack while drizzle assembles the VALUES list (and would blow past Postgres's 65535 bound-param limit at ~32k rows); large tables failed with 'Maximum call stack size exceeded'. Write in 1000-row chunks inside the same per-table transaction so keying stays atomic.

* fix(tables): emit orderKey in insert responses
The single-row and batch insert handlers dropped orderKey from the JSON response even though the service returns it, so reconcileCreatedRow always fell back to position-sorting and could place neighbor inserts wrong under the fractional-ordering flag. Serialize orderKey alongside position.

* fix(tables): restore by orderKey, not position, under fractional flag
A saved position is the gappy column value, but under the flag insert reads position as a visual rank (OFFSET), so position-based restore misplaces rows.
- create-row redo now goes through the batch path carrying the saved orderKey (the single-insert API has no orderKey field); drop the now-unused single create mutation.
- resolveBatchInsertOrderKeys appends under the flag instead of feeding gappy positions to resolveInsertOrderKey; positions remain the flag-off path.

* perf(tables): backfill writes 5000 rows/chunk (was 1000)
5x fewer round-trips per table; ~10k bound params stays well under Postgres's 65535 ceiling and far below the single-statement size that overflows the stack.

* fix(tables): drop rowNumber from table trigger payload
position is gappy under the fractional-ordering flag, so rowNumber (= row.position) no longer reflects a contiguous visual rank. Rather than compute-on-read, remove it from the trigger payload, output schema, and column-execution input. Also pin isTablesFractionalOrderingEnabled=false in update-row.test.ts so its flag-off position-shift assertions are deterministic regardless of local env.

* chore(db): format generated 0226 migration metadata
biome check . flagged the drizzle-generated _journal.json and 0226_snapshot.json; apply the formatter so packages/db lint:check passes in CI.

* docs(triggers): drop rowNumber from table trigger outputs
rowNumber was removed from the table trigger payload; remove it from the documented output fields to match.

* test(tables): remove flag-on fractional-ordering unit suite
Flag-on behavior is covered by manual large-table verification; the heavily-mocked DB-chain suite added little signal.
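
The chunk-size arithmetic in the backfill commits can be checked with a small sketch. It assumes two bound parameters per row (a row id and its order_key), which is inferred from the "~32k rows" and "~10k bound params" figures above rather than taken from the script itself; the helper names are hypothetical.

```typescript
const POSTGRES_MAX_BOUND_PARAMS = 65535
const PARAMS_PER_ROW = 2 // assumption: (id, order_key) per VALUES tuple

// Largest number of rows a single parameterized statement could carry.
function maxRowsPerStatement(paramsPerRow: number): number {
  return Math.floor(POSTGRES_MAX_BOUND_PARAMS / paramsPerRow)
}

// Split items into consecutive chunks of at most `size` elements.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError('chunk size must be a positive integer')
  }
  const out: T[][] = []
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size))
  }
  return out
}

const rowLimit = maxRowsPerStatement(PARAMS_PER_ROW)
// -> 32767 rows before the bound-param ceiling is hit

const rowIds = Array.from({ length: 12001 }, (_, i) => `row_${i}`)
const chunkSizes = chunk(rowIds, 5000).map((c) => c.length)
// -> [5000, 5000, 2001], each chunk binding at most 10000 params
```

In the actual backfill each chunk would be written inside the same per-table transaction, so a failure partway through leaves no table half-keyed.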
…ERE-clause validation (#4895)

* fix(clickhouse): centralize WHERE-clause validation in input-validation and harden tautology detection

* fix(clickhouse): enforce server-side readonly=1 on the query operation

* fix(clickhouse): allow BETWEEN bounds in WHERE validation (OR-only literal rule) and dedupe JSDoc
…rs, clickhouse integration